TSNLP - Test Suites for Natural Language Processing
Authors
Abstract
The growing language [...] has produced substantial (i.e. larger than any existing general test suites) multi-purpose and multi-user test suites for three European languages together with a set of specialized tools that facilitate the construction, extension, maintenance, retrieval, and customization of the test data. The publicly available results of TSNLP represent a valuable linguistic resource that has the potential of providing a wide-spread pre-standard diagnostic and evaluation tool for both developers and users of NLP applications.

1 Background and Motivation

Evaluation of NLP applications plays an increasingly important role in both the academic and industrial NL communities. Two tools traditionally used for evaluating and testing NLP systems are test suites and test corpora. The two can be seen as serving complementary purposes (see Dauphin et al. (1995a)): in contrast to text corpora, whose main advantage is that they reflect naturally occurring data, the key properties of test suites are (i) systematicity, (ii) control over data, (iii) inclusion of negative data, and (iv) exhaustivity.

The project was started in December 1993 and completed in March 1996. Most of the project results (documents, bibliography, test data, and software) as well as on-line access to the test suite database and email addresses of the project members can be obtained through the world-wide web from the TSNLP home page 'http://tsnlp.dfki.uni-sb.de/tsnlp/'.
The TSNLP project was funded within the Linguistic Research and Engineering (LRE) programme of the European Commission (DG XIII) under research grant LRE-62-089 and by the Swiss Federal Government.

[...] Aerospatiale, France, Common Research Center, 12, rue Pasteur, BP 76, F-92152 Suresnes Cedex, +33 1-46973061. University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK, +44 1206-872086.

Among the main motivations for the TSNLP project were the lack of general guidelines for test suite construction, of adequate and comprehensive test material, and of appropriate tools. The resulting duplication of effort among test suite developers obviously leads to a waste of time and resources. In addition, one of the main conclusions of a study of existing test suites conducted during the first stage of the project (Estival et al. (1994)) was that the reusability of existing test suites is severely hampered by their lack of structure and annotations. Indeed, despite the pioneering efforts of Flickinger et al. (1987) and Nerbonne et al. (1993), most of the existing test suites were written for some specific system or simply enumerate a number of interesting examples and, thus, do not meet the demand for large, systematic, well-documented, highly-structured and annotated collections of linguistic material, which is now required by a growing number of NLP applications. The TSNLP test suite addresses these demands and provides powerful tools for the construction and manipulation of the test data.
On the one hand, since every NLP system (whether commercial or under development) exhibits specific features which make it unique, and every user (or developer) of an NLP system has specific needs and requirements, the TSNLP approach is based on the assumption that, in order to yield informative and interpretable results, any test suite used for an actual test or evaluation must be specific (at least to some degree) to the system and the user. On the other hand, since testing or evaluating NLP systems is performed for a variety of purposes, the TSNLP approach is also guided by the need to provide test material which is easily reusable. To achieve these two goals of specificity and reusability, the traditional notion of a test suite as a monolithic set of test items has been abandoned in favour of the notion of a database in which test items are stored together with a rich inventory of associated linguistic and non-linguistic ...
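The database notion described above can be illustrated with a minimal Python sketch. It is not the TSNLP schema or tooling: the field names (`language`, `phenomenon`, `wellformed`), the `select` helper, and the example sentences are all assumptions made up for illustration. It only shows how storing annotations alongside each test item lets one shared collection be filtered into a test suite specific to a given system or user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestItem:
    """One test item with illustrative annotations (field names are assumptions)."""
    text: str
    language: str      # e.g. "en", "fr", "de"
    phenomenon: str    # hypothetical label for the linguistic phenomenon tested
    wellformed: bool   # False marks negative (ill-formed) data


def select(items, **criteria):
    """Return the items whose annotations match every given criterion."""
    return [
        item for item in items
        if all(getattr(item, key) == value for key, value in criteria.items())
    ]


# Invented sample data, not taken from the TSNLP test suites.
ITEMS = [
    TestItem("The manager works.", "en", "agreement", True),
    TestItem("The manager work.", "en", "agreement", False),
    TestItem("Der Manager arbeitet.", "de", "agreement", True),
]

# A user-specific view of the shared data: English negative items only.
english_negative = select(ITEMS, language="en", wellformed=False)
print([item.text for item in english_negative])
```

The same collection serves other purposes by changing the criteria, for example `select(ITEMS, language="de")` for a German-only run, which is the kind of reuse a monolithic list of sentences does not offer.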
Similar resources
Test Suites for Quality Evaluation of NLP Products
Test suites are a useful evaluation tool for developers and users of NLP products. The paper gives an overview of the TSNLP design and methodology and describes how the TSNLP data and methodology can be used in practice to provide a reliable assessment method of the linguistic capabilities of NLP products.
Towards Systematic Testing and Diagnosis Integrating TSNLP and ALEP
A recent addition to the ALEP grammar engineering platform is described: the test suite apparatus and test data produced in the TSNLP project have been seamlessly integrated with the ALEP task executor. The resulting test suite extension to ALEP is well-suited to substitute for the existing naive testing environment, greatly increases testing and report generation flexibility and fixes several (pre...
Towards systematic grammar profiling. Test suite technology 10 years after
An experiment with recent test suite and grammar (engineering) resources is outlined: a critical assessment of the EU-funded TSNLP (Test Suites for Natural Language Processing) package as a diagnostic and benchmarking facility for a distributed (multi-site) large-scale HPSG grammar engineering effort. This paper argues for a generalized, systematic, and fully automated testing and diagnosis fac...
A Test Suite for Inference Involving Adjectives
Recently, most of the research in NLP has concentrated on the creation of applications coping with textual entailment. However, there still exist very few resources for the evaluation of such applications. We argue that the reason for this resides not only in the novelty of the research field but also and mainly in the difficulty of defining the linguistic phenomena which are responsible for in...
Test suites: some issues in their use and design
Evaluation has always been a subject of interest to the MT community. It has also been a source of grief, as witnessed by the damning ALPAC Report (see Pierce and Carroll, 1966). This report led to the virtual end of government funding for MT in the USA in the sixties since it concluded that there was no immediate prospect of MT producing useful translation of general scientific texts. However,...
Markup of a Test Suite with SGML
Recently, there have been various attempts to set up a test suite covering the syntactic phenomena of a natural language (cp. [Flickinger et al. 1989], [Nerbonne et al. 1993]). The latest effort is the TSNLP project (Test Suite for Natural Language Processing) within the Linguistic Research and Engineering (LRE) framework sponsored by the European Union (cp. [Balkan et al. 1994]). These test sui...